pacman::p_load(tmap, sf, sp, reshape2, ggplot2, ggpubr, tidyverse, stplanr, knitr, kableExtra)Takehome-Ex 3: Exploratory Spatial Data Analysis for Potential Push & Pull Factors of Locations in Singapore using Public Bus Data
Overview
In this exercise, my focus would be on prototyping exploratory spatial data analysis, particularly density of spatial assets within hexagonal traffic analysis zones. In conventional geography, a traffic analysis zone is the unit most commonly used in transportation planning models and the size of it varies. Hexagonal traffic analysis zones has gained traction as the hexagons of the study area have a uniform size which are easily comparable with each other when determining transport attractiveness. It is also recommended that hexagon radius should be 125m for areas in high urbanisation and 250m in areas with less urbanisation (Chmielewski et al., 2020).
The data preparation for this purpose also supports the preparation for calibrating spatial interaction models. For the prototyping, only exploratory spatial data analysis will be done.
Package Installation
I will load up the below R packages for the following purposes:
tmap: to visualise spatial data by creating thematic maps that could be interactive or static.
sf: to manipulate spatial data.
sp: to manipulate spatial data (older than sp).
reshape2: to reshape and transform data frames; converting data between “wide” and “long” formats using functions like melt().
ggplot2: to visualise data with plot types including bar plots.
tidyverse: to clean and transform data and contains sub-packagess including tidyr, dplyr and readr.
stplanr: to provide functions suitable for working with spatial transport data like networks, orgins-destinations matrices and travel time matrices; builds on the capabilities of sf and sp packages.
knitr: to combine code into a single documents that can be easily converted to various output formats like html, pdf, or Word.
kableExtra: to further style any Kables.
Data Preparation
URA’s Masterplan Subzone 2019 Layer in shapefile format
Bus Stop Locations extracted from LTA
A tabulated bus passenger flow for Nov 2023, Dec 2023 and Jan 2024 from LTA dynamic data mall
Population Data for 2023 from SingStat
Schools from MOE
Financial Services
Hospitals, polyclinics and CHAS clinics derived from Google Maps
Subzone Layer
Read the subzone layer
mpsz <- st_read("data/geospatial/master-plan-2019-subzone-boundary-no-sea-kml.kml")Convert mpsz to 2D geometry.
mpsz <- st_zm(mpsz, drop = TRUE) # Convert 3D geometries to 2DExtract the SUBZONE_N and PLN_AREA_N from the Dscrptn field
For SUBZONE_N,
mpsz <- mpsz %>%
rowwise() %>%
mutate(SUBZONE_N = str_extract(Description, "<th>SUBZONE_N</th> <td>(.*?)</td>")) %>%
ungroup()
mpsz$SUBZONE_N <- str_remove_all(mpsz$SUBZONE_N, "<.*?>|SUBZONE_N")For PLN_AREA_N,
mpsz <- mpsz %>%
rowwise() %>%
mutate(PLN_AREA_N = str_extract(Description, "<th>PLN_AREA_N</th> <td>(.*?)</td>")) %>%
ungroup()
mpsz$PLN_AREA_N <- str_remove_all(mpsz$PLN_AREA_N, "<.*?>|PLN_AREA_N")Remove the Description column
mpsz$Description <- NULLCreate the shapefile
st_write(mpsz, "data/geospatial/mpsz_sf.shp")Read the updated shapefile.
mpsz_sf <- st_read("data/geospatial/mpsz_sf.shp")Reading layer `mpsz_sf' from data source
`C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\mpsz_sf.shp'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
Creating Spatial Grids
mpsz3414 <- st_transform(mpsz_sf, 3414)
outer_islands <- c("SEMAKAU", "SUDONG", "NORTH-EASTERN ISLANDS", "SOUTHERN GROUP")
mpsz3414 <- mpsz3414 %>%
filter(!str_trim(SUBZONE_N) %in% str_trim(outer_islands))hex_grid <- st_make_grid(mpsz3414, cellsize = c(750, 750), what = "polygons", square = FALSE) %>%
st_sf() %>%
# Apply as.factor() since index will be used as the identifier to link to other data sets
mutate(index = as.factor(row_number()))
# Create border of Singapore's land area
mpsz_border <- mpsz3414 %>% st_union()
# Clip the hexagon grid layer
hex_grid_bounded <- st_intersection(hex_grid, mpsz_border)# Check if hex grid intersects any polygons using st_intersects
# Returns a list of intersecting hexagons
intersection_list = hex_grid$index[lengths(st_intersects(hex_grid, hex_grid_bounded)) > 0]
# Filter for the intersecting hexagons
hex_grid_bounded2 = hex_grid %>%
filter(index %in% intersection_list)
tm_shape(hex_grid_bounded2) +
tm_polygons(alpha = 0.2) +
tm_basemap("OpenStreetMap")
- The map above now shows the complete analytical hexagon data of 375m (perpendicular distance between the centre of hexagon and its edges) that represents the TAZ.
Map over SUBZONE_N and PLN_AREA_N to hex_grid_bounded2.
joined <- st_join(hex_grid_bounded2, mpsz3414, join = st_intersects, left = FALSE)
aggregated <- joined %>%
group_by(index) %>%
summarise(SUBZONE_N = first(SUBZONE_N))
hex_grid_bounded2$SUBZONE_N <- aggregated$SUBZONE_N
hex_grid_bounded2 <- hex_grid_bounded2 %>%
mutate(index = as.factor(row_number()))
hex_grid_bounded2 <- hex_grid_bounded2[, c("SUBZONE_N", setdiff(names(hex_grid_bounded2), "SUBZONE_N"))]
tm_shape(hex_grid_bounded2) +
tm_fill(col = "SUBZONE_N",
alpha = 0.7)
aggregated <- joined %>%
group_by(index) %>%
summarise(PLN_AREA_N = first(PLN_AREA_N))
hex_grid_bounded2$PLN_AREA_N <- aggregated$PLN_AREA_N
hex_grid_bounded2 <- hex_grid_bounded2 %>%
mutate(index = as.factor(row_number()))
hex_grid_bounded2 <- hex_grid_bounded2[, c("PLN_AREA_N", setdiff(names(hex_grid_bounded2), "PLN_AREA_N"))]
tm_shape(hex_grid_bounded2) +
tm_fill(col = "PLN_AREA_N",
alpha = 0.7)
tmap_mode("plot")Bus Stop Locations
BusStop <- read.csv("data/aspatial/bus_coords_subzone_v2.csv") %>% st_as_sf(coords=c("Longitude", "Latitude"), crs=4326) %>%
st_transform(crs=3414)Compute Bus Stop Density
Using st_intersects(), we can intersect the bus stops layer and the hexagon layer and use lengths() to count the number of bus stops that lie inside each Traffic Analysis Zone. These count values are appended back to each spatial grid and encapsulated into a new column called busstop_count in a duplicated dataframe, hex_grid_bounded2.
hex_grid_bounded2$busstop_count <- lengths(st_intersects(hex_grid_bounded2, BusStop))tm_shape(hex_grid_bounded2) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"))
tail(hex_grid_bounded2,10)Simple feature collection with 10 features and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 49542.54 ymin: 30108.73 xmax: 56667.54 ymax: 38768.98
Projected CRS: SVY21 / Singapore TM
PLN_AREA_N SUBZONE_N geometry index
1664 CHANGI BAY CHANGI BAY POLYGON ((49917.54 34005.84... 1664
1665 CHANGI BAY CHANGI BAY POLYGON ((49917.54 36603.92... 1665
1666 CHANGI CHANGI AIRPORT POLYGON ((49917.54 37902.96... 1666
1667 CHANGI BAY CHANGI BAY POLYGON ((50292.54 33356.32... 1667
1668 CHANGI BAY CHANGI BAY POLYGON ((50292.54 37253.44... 1668
1669 CHANGI BAY CHANGI BAY POLYGON ((51042.54 37253.44... 1669
1670 CHANGI BAY CHANGI BAY POLYGON ((51417.54 36603.92... 1670
1671 CHANGI BAY CHANGI BAY POLYGON ((54417.54 30108.73... 1671
1672 CHANGI BAY CHANGI BAY POLYGON ((55542.54 33356.32... 1672
1673 CHANGI BAY CHANGI BAY POLYGON ((56292.54 33356.32... 1673
busstop_count
1664 0
1665 1
1666 0
1667 0
1668 0
1669 0
1670 0
1671 0
1672 0
1673 0
Constructing O-D Matrix of commuter flow
od_bus_nov <- read_csv("data/OD Bus/merged_bus_nov_v2.csv")
od_bus_dec <- read_csv("data/OD Bus/merged_bus_dec_v2.csv")
od_bus_jan <- read_csv("data/OD Bus/merged_bus_jan_v2.csv")OD <- rbind(od_bus_nov, od_bus_dec)
OD <- rbind(OD, od_bus_jan)
nrow(od_bus_nov) + nrow(od_bus_dec) + nrow(od_bus_jan) == nrow(OD) # evaluates to TRUEstr(OD)ORIGIN_PT_CODE and DESTINATION_PT_CODE are listed as character variables. These variables should be transformed to factors so that R treats them as grouping variables.
cols_to_convert <- c("ORIGIN_PT_CODE", "DESTINATION_PT_CODE")
OD[cols_to_convert] <- lapply(OD[cols_to_convert], as.factor)
glimpse(OD)ORIGIN_PT_CODEandDESTINATION_PT_CODE` are now factors.
write_rds(OD, "data/rds/odbus_combined.rds")odbus_combined <- readRDS("data/rds/odbus_combined.rds")Commuting Flow Data
od <- odbus_combined %>%
group_by(ORIGIN_PT_CODE, DESTINATION_PT_CODE, DAY_TYPE ,TIME_PER_HOUR) %>%
summarise(TRIPS = sum(TOTAL_TRIPS)) %>%
ungroup()Geospatial Data Wrangling
Now we need to convert the od data from aspatial to geospatial data.
First, we populate the hexagon grid indexes of hex_grid_bounded2 sf data frame into BusStop sf data frame using st_intersection().
BusStop_hex <- st_intersection(BusStop, hex_grid_bounded2) %>%
select(BusStopCode, index) %>%
st_drop_geometry()
cols_to_convert <- c("BusStopCode")
BusStop_hex[cols_to_convert] <- lapply(BusStop_hex[cols_to_convert], as.factor)
glimpse(BusStop_hex)Rows: 5,103
Columns: 2
$ BusStopCode <fct> 25059, 26379, 25751, 25761, 25711, 25719, 26369, 25741, 26…
$ index <fct> 29, 31, 39, 39, 40, 40, 41, 49, 50, 50, 51, 52, 52, 59, 60…
Append the hexagon grid index from BusStop_hex data frame to the segregated od data frames for both ORIGIN_PT_CODE and DESTINATION_PT_CODE.
od_data <- left_join(od, BusStop_hex,
by = c("ORIGIN_PT_CODE" = "BusStopCode")) %>%
rename("ORIGIN_hex" = "index")
od_data <- left_join(od_data, BusStop_hex,
by = c("DESTINATION_PT_CODE" = "BusStopCode")) %>%
rename("DESTIN_hex" = "index") %>%
drop_na() %>%
group_by(ORIGIN_hex, DESTIN_hex, DAY_TYPE, TIME_PER_HOUR) %>%
summarise(TOTAL_TRIPS = sum(TRIPS))
cols_to_convert <- c("ORIGIN_hex", "DESTIN_hex")
od_data[cols_to_convert] <- lapply(od_data[cols_to_convert], as.factor)
write_rds(od_data, "data/rds/od_data.rds")Check for duplicates
duplicate <- od_data %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
DT::datatable(duplicate)No duplicates found.
Visualisation of O-D Flows
Desire lines can be used to visualise the OD matrix, which are rays connecting a site to associated location points.
Remove intra-zonal flows
It is not meaningful to include trips that started and ended from the same point, so we will only include point pairs where the origin and destination of a trip are different.
od_plot <- od_data[od_data$ORIGIN_hex!=od_data$DESTIN_hex,]Create desire lines
Use od2line() of stplanr package to create the desire lines.
flowLine <- od2line(flow = od_plot,
zones = hex_grid_bounded2,
zone_code = "index")write_rds(flowLine, "data/rds/flowLine.rds")flowLine <- read_rds("data/rds/flowLine.rds")Visualise desire lines
Use tmap() to visualise the resulting desire lines. For a clearer visualisation, only desire lines with at least 5000 trips are shown.
The traffic density by desire lines may vary across time. Thus, the visualisations will be more intuitive if we further segment the data by hour.
Weekday
tm_shape(hex_grid_bounded2) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
flowLine %>%
filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 6) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
style = "fixed",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips") +
tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 6am (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
tm_shape(hex_grid_bounded2) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
flowLine %>%
filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 12) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
style = "fixed",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips") +
tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 12nn (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
tm_shape(hex_grid_bounded2) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
flowLine %>%
filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 18) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
style = "fixed",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips") +
tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 6pm (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
tm_shape(hex_grid_bounded2) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
flowLine %>%
filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 0) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
style = "fixed",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips") +
tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 12am (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
- It is logical that there is less bus traffic in the weehours since most of the population is either asleep or not working at this hour.
Weekend/Holiday
tm_shape(hex_grid_bounded2) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
flowLine %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY",TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 6) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
style = "fixed",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips") +
tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 6am (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
tm_shape(hex_grid_bounded2) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
flowLine %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 12) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
style = "fixed",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips") +
tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekday 12nn (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
tm_shape(hex_grid_bounded2) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
flowLine %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY", TOTAL_TRIPS >= 5000, TIME_PER_HOUR == 18) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
style = "fixed",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips") +
tm_layout(main.title = "Desire Lines with at least 5000 trips \nbetween Traffic Analysis Zones for Weekend 6pm (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
We can also focus on individual planning areas to observe more commuting trends within the planning area.
tmap_mode("view")
hex_grid_bounded2 %>%
filter(PLN_AREA_N == "JURONG EAST") %>%
tm_shape() +
tm_fill(col = "busstop_count",
palette = "Greens",
style = "cont",
title = "Bus Stop Density",
popup.vars = c("SUBZONE_N"),
alpha = 0.2) +
tm_view(set.zoom.limits = c(11,14)) +
tm_borders(col = "grey") +
flowLine %>%
filter(TOTAL_TRIPS >= 5000) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
style = "fixed",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
popup.vars = c("TOTAL_TRIPS"))tmap_mode("plot")Computing Distance Matrix
mpsz_sp <- as(mpsz_sf, "Spatial")dist <- spDists(mpsz_sp,
longlat = FALSE)head(dist, n=c(10,10))Label column and row headers of distance matrix
Create a list sorted by planning subzone.
sz_names <- mpsz_sf$SUBZONE_NAttach the subzones to row and column for distance matrix matching.
colnames(dist) <- paste0(sz_names)
rownames(dist) <- paste0(sz_names)Pivoting distance value by SUBZONE_N
Pivot the distance matrix into a long table by using the row and column subzone names.
distPair <- melt(dist) %>%
rename(dist = value)
head(distPair, 10) Var1 Var2 dist
1 MARINA EAST MARINA EAST 0.00000000
2 INSTITUTION HILL MARINA EAST 0.03528227
3 ROBERTSON QUAY MARINA EAST 0.03539590
4 JURONG ISLAND AND BUKOM MARINA EAST 0.18199800
5 FORT CANNING MARINA EAST 0.02687355
6 MARINA EAST (MP) MARINA EAST 0.01286792
7 SUDONG MARINA EAST 0.17287059
8 SEMAKAU MARINA EAST 0.13479457
9 SOUTHERN GROUP MARINA EAST 0.06792530
10 SENTOSA MARINA EAST 0.05759682
- Notice that from the first row, the within zone distance is 0.
Updating intra-zonal distances
Append a constance value to replace the intra-zonal distance of 0.
distPair %>%
filter(dist > 0) %>%
summary() Var1 Var2
MARINA EAST : 331 MARINA EAST : 331
INSTITUTION HILL : 331 INSTITUTION HILL : 331
ROBERTSON QUAY : 331 ROBERTSON QUAY : 331
JURONG ISLAND AND BUKOM: 331 JURONG ISLAND AND BUKOM: 331
FORT CANNING : 331 FORT CANNING : 331
MARINA EAST (MP) : 331 MARINA EAST (MP) : 331
(Other) :107906 (Other) :107906
dist
Min. :0.001561
1st Qu.:0.064419
Median :0.107153
Mean :0.110151
3rd Qu.:0.147771
Max. :0.448634
Spatial Push and Pull Factors
Push factors are reasons for pushing people or passengers away from their location. Pull factors are reasons for attracting passenger to a location.
Population density of an area has an impact on movement patterns as higher population density areas can act as a propulsive force. The HDB data will be used as a proxy for the population density, where a greater number of units will indicate a higher population density.
Employment opportunities density of an area can also contribute a location’s push or pull. An area with more businesses will attract more workers to the area and thus the registered business data will be used as a proxy.
School Density can determine the volume of passengers commuting to an area as an area with more schools would attract more human traffic whilst an area with fewer or no schools will not attract students especially during school hours.
Financial Services Density can contribute to the overall attractiveness of a destination of a destination for employment by offering convenience.
Public Healthcare Density can reflect the utility of polyclinics and public hospitals through the traffic network.
Population Density
Importing Aspatial HDB data
Use read_csv() from the readr package to import the prepared HDB csv data.
hdb <- read_csv("data/aspatial/hdb.csv")
glimpse(hdb)Rows: 12,442
Columns: 37
$ ...1 <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
$ blk_no <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
$ street <chr> "BEACH RD", "BEDOK STH AVE 1", "CANTONMENT RD", …
$ max_floor_lvl <dbl> 16, 14, 2, 15, 4, 25, 12, 14, 12, 2, 15, 15, 13,…
$ year_completed <dbl> 1970, 1975, 2010, 1982, 1975, 1982, 1975, 1977, …
$ residential <chr> "Y", "Y", "N", "Y", "Y", "Y", "Y", "Y", "Y", "N"…
$ commercial <chr> "Y", "N", "Y", "N", "Y", "N", "N", "N", "Y", "Y"…
$ market_hawker <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
$ miscellaneous <chr> "N", "Y", "N", "N", "N", "N", "Y", "Y", "N", "N"…
$ multistorey_carpark <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
$ precinct_pavilion <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"…
$ bldg_contract_town <chr> "KWN", "BD", "CT", "BD", "PRC", "BM", "QT", "GL"…
$ total_dwelling_units <dbl> 142, 206, 0, 102, 55, 96, 125, 247, 95, 0, 220, …
$ `1room_sold` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ `2room_sold` <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ `3room_sold` <dbl> 138, 204, 0, 0, 54, 0, 118, 0, 62, 0, 216, 214, …
$ `4room_sold` <dbl> 1, 0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `5room_sold` <dbl> 2, 2, 0, 92, 1, 96, 7, 0, 33, 0, 4, 5, 0, 4, 0, …
$ exec_sold <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ multigen_sold <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ studio_apartment_sold <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ `1room_rental` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 319, 0, 0, 0…
$ `2room_rental` <dbl> 0, 0, 0, 0, 0, 0, 0, 247, 0, 0, 0, 0, 0, 0, 56, …
$ `3room_rental` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 1,…
$ other_room_rental <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 14, 0,…
$ lat <dbl> 1.295097, 1.320852, 1.275488, 1.327969, 1.388610…
$ lng <dbl> 103.8541, 103.9337, 103.8414, 103.9227, 103.9881…
$ building <chr> "RAFFLES HOTEL", "NIL", "PINNACLE @ DUXTON", "PI…
$ addr <chr> "1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673", "…
$ postal <chr> "189673", "460001", "080001", "461001", "500001"…
$ SUBZONE_NO <dbl> 2, 6, 3, 3, 1, 9, 10, 5, 3, 5, 1, 5, 2, 2, 1, 7,…
$ SUBZONE_N <chr> "CITY HALL", "BEDOK SOUTH", "CHINATOWN", "KEMBAN…
$ SUBZONE_C <chr> "DTSZ02", "BDSZ06", "OTSZ03", "BDSZ03", "CHSZ01"…
$ PLN_AREA_N <chr> "DOWNTOWN CORE", "BEDOK", "OUTRAM", "BEDOK", "CH…
$ PLN_AREA_C <chr> "DT", "BD", "OT", "BD", "CH", "BM", "QT", "GL", …
$ REGION_N <chr> "CENTRAL REGION", "EAST REGION", "CENTRAL REGION…
$ REGION_C <chr> "CR", "ER", "CR", "ER", "ER", "CR", "CR", "CR", …
For the purpose of computing a proxy for population density, the residential units will be extracted using filter() from the dplyr package.
hdb_residential <- hdb %>%
filter(residential == "Y")
head(hdb_residential, 10)# A tibble: 10 × 37
...1 blk_no street max_floor_lvl year_completed residential commercial
<dbl> <chr> <chr> <dbl> <dbl> <chr> <chr>
1 0 1 BEACH RD 16 1970 Y Y
2 1 1 BEDOK STH A… 14 1975 Y N
3 3 1 CHAI CHEE RD 15 1982 Y N
4 4 1 CHANGI VILL… 4 1975 Y Y
5 5 1 DELTA AVE 25 1982 Y N
6 6 1 DOVER RD 12 1975 Y N
7 7 1 EUNOS CRES 14 1977 Y N
8 8 1 EVERTON PK 12 1980 Y Y
9 10 1 GHIM MOH RD 15 1975 Y N
10 11 1 HAIG RD 15 1976 Y N
# ℹ 30 more variables: market_hawker <chr>, miscellaneous <chr>,
# multistorey_carpark <chr>, precinct_pavilion <chr>,
# bldg_contract_town <chr>, total_dwelling_units <dbl>, `1room_sold` <dbl>,
# `2room_sold` <dbl>, `3room_sold` <dbl>, `4room_sold` <dbl>,
# `5room_sold` <dbl>, exec_sold <dbl>, multigen_sold <dbl>,
# studio_apartment_sold <dbl>, `1room_rental` <dbl>, `2room_rental` <dbl>,
# `3room_rental` <dbl>, other_room_rental <dbl>, lat <dbl>, lng <dbl>, …
There are also some outliers like hotels that are classified as a residential unit. We can remove rows containing ‘hotel’ using grepl().
hotels <- hdb_residential %>%
filter(grepl("HOTEL", building, ignore.case = TRUE))
kable(hotels)| …1 | blk_no | street | max_floor_lvl | year_completed | residential | commercial | market_hawker | miscellaneous | multistorey_carpark | precinct_pavilion | bldg_contract_town | total_dwelling_units | 1room_sold | 2room_sold | 3room_sold | 4room_sold | 5room_sold | exec_sold | multigen_sold | studio_apartment_sold | 1room_rental | 2room_rental | 3room_rental | other_room_rental | lat | lng | building | addr | postal | SUBZONE_NO | SUBZONE_N | SUBZONE_C | PLN_AREA_N | PLN_AREA_C | REGION_N | REGION_C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | BEACH RD | 16 | 1970 | Y | Y | N | N | N | N | KWN | 142 | 0 | 1 | 138 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.295097 | 103.8541 | RAFFLES HOTEL | 1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673 | 189673 | 2 | CITY HALL | DTSZ02 | DOWNTOWN CORE | DT | CENTRAL REGION | CR |
| 4580 | 3 | BEACH RD | 16 | 1970 | Y | Y | N | N | N | N | KWN | 138 | 0 | 1 | 134 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.294801 | 103.8545 | RAFFLES HOTEL SINGAPORE | 3 BEACH ROAD RAFFLES HOTEL SINGAPORE SINGAPORE 189674 | 189674 | 2 | CITY HALL | DTSZ02 | DOWNTOWN CORE | DT | CENTRAL REGION | CR |
The HDB Blk 1 Beach Road shares a similar address as Raffles Hotel’s 1 Beach Road, but they have different postal codes.
To verify other similar addresses, filter for addresses containing “BEACH RD”.
beach_rd <- hdb_residential %>%
filter(grepl("BEACH RD", street, ignore.case = TRUE))
kable(beach_rd)| …1 | blk_no | street | max_floor_lvl | year_completed | residential | commercial | market_hawker | miscellaneous | multistorey_carpark | precinct_pavilion | bldg_contract_town | total_dwelling_units | 1room_sold | 2room_sold | 3room_sold | 4room_sold | 5room_sold | exec_sold | multigen_sold | studio_apartment_sold | 1room_rental | 2room_rental | 3room_rental | other_room_rental | lat | lng | building | addr | postal | SUBZONE_NO | SUBZONE_N | SUBZONE_C | PLN_AREA_N | PLN_AREA_C | REGION_N | REGION_C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | BEACH RD | 16 | 1970 | Y | Y | N | N | N | N | KWN | 142 | 0 | 1 | 138 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.295097 | 103.8541 | RAFFLES HOTEL | 1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673 | 189673 | 2 | CITY HALL | DTSZ02 | DOWNTOWN CORE | DT | CENTRAL REGION | CR |
| 1660 | 15 | BEACH RD | 20 | 1974 | Y | Y | N | N | N | N | KWN | 76 | 0 | 0 | 0 | 76 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.295796 | 103.8555 | NIL | 15 BEACH ROAD | NIL | 1 | BUGIS | DTSZ01 | DOWNTOWN CORE | DT | CENTRAL REGION | CR |
| 2079 | 17 | BEACH RD | 20 | 1974 | Y | Y | N | N | N | N | KWN | 76 | 0 | 0 | 0 | 76 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.303689 | 103.8636 | GOLDEN BEACH VISTA | 17 BEACH ROAD GOLDEN BEACH VISTA SINGAPORE 190017 | 190017 | 9 | CRAWFORD | KLSZ09 | KALLANG | KL | CENTRAL REGION | CR |
| 2567 | 2 | BEACH RD | 16 | 1970 | Y | Y | N | N | N | N | KWN | 139 | 0 | 1 | 136 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.390462 | 103.9753 | CHANGI BEACH CLUB | 2 ANDOVER ROAD CHANGI BEACH CLUB SINGAPORE 509984 | 509984 | 1 | CHANGI POINT | CHSZ01 | CHANGI | CH | EAST REGION | ER |
| 4580 | 3 | BEACH RD | 16 | 1970 | Y | Y | N | N | N | N | KWN | 138 | 0 | 1 | 134 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.294801 | 103.8545 | RAFFLES HOTEL SINGAPORE | 3 BEACH ROAD RAFFLES HOTEL SINGAPORE SINGAPORE 189674 | 189674 | 2 | CITY HALL | DTSZ02 | DOWNTOWN CORE | DT | CENTRAL REGION | CR |
| 6028 | 4 | BEACH RD | 16 | 1968 | Y | N | N | Y | N | N | KWN | 336 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 336 | 0 | 0 | 0 | 1.304716 | 103.8652 | NIL | 4 BEACH ROAD SINGAPORE 190004 | 190004 | 9 | CRAWFORD | KLSZ09 | KALLANG | KL | CENTRAL REGION | CR |
| 7743 | 5 | BEACH RD | 16 | 1968 | Y | N | N | Y | N | N | KWN | 336 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 336 | 0 | 0 | 0 | 1.298092 | 103.8569 | BEACH ROAD CONSERVATION AREA | 5 TAN QUEE LAN STREET BEACH ROAD CONSERVATION AREA SINGAPORE 188094 | 188094 | 1 | BUGIS | DTSZ01 | DOWNTOWN CORE | DT | CENTRAL REGION | CR |
| 8956 | 6 | BEACH RD | 16 | 1968 | Y | Y | N | N | N | N | KWN | 198 | 0 | 45 | 1 | 28 | 0 | 0 | 0 | 0 | 57 | 67 | 0 | 0 | 1.303992 | 103.8644 | BEACH ROAD GARDENS | 6 BEACH ROAD BEACH ROAD GARDENS SINGAPORE 190006 | 190006 | 9 | CRAWFORD | KLSZ09 | KALLANG | KL | CENTRAL REGION | CR |
2, 5 and 15 Beach Road do not have the correct postal codes following the 1900XX convention. Additionally, these addresses do not have the correct coordinates too.
With reference to URA’s official asset map of Singapore, OneMap, the data will be manually modified using mutate() and ifelse().
hdb_residential2 <- hdb_residential %>%
mutate(postal = ifelse(blk_no == 1 & street == "BEACH RD", 190001, postal)) %>%
mutate(lat = ifelse(blk_no == 1 & street == "BEACH RD", 1.3036714, lat)) %>%
mutate(lng = ifelse(blk_no == 1 & street == "BEACH RD", 103.8644787, lng)) %>%
mutate(postal = ifelse(blk_no == 2 & street == "BEACH RD", 190002, postal)) %>%
mutate(lat = ifelse(blk_no == 2 & street == "BEACH RD", 1.3040331, lat)) %>%
mutate(lng = ifelse(blk_no == 2 & street == "BEACH RD", 103.8649285, lng)) %>%
mutate(postal = ifelse(blk_no == 3 & street == "BEACH RD", 190003, postal)) %>%
mutate(lat = ifelse(blk_no == 3 & street == "BEACH RD", 1.3041872, lat)) %>%
mutate(lng = ifelse(blk_no == 3 & street == "BEACH RD", 103.8651934, lng)) %>%
mutate(postal = ifelse(blk_no == 5 & street == "BEACH RD", 190005, postal)) %>%
mutate(lat = ifelse(blk_no == 5 & street == "BEACH RD", 1.3043463, lat)) %>%
mutate(lng = ifelse(blk_no == 5 & street == "BEACH RD", 103.8648158, lng)) %>%
mutate(postal = ifelse(blk_no == 15 & street == "BEACH RD", 190015, postal)) %>%
mutate(lat = ifelse(blk_no == 15 & street == "BEACH RD", 1.3034254, lat)) %>%
mutate(lng = ifelse(blk_no == 15 & street == "BEACH RD", 103.8631535, lng))Check for any duplicates.
duplicate <- hdb_residential2 %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()
DT::datatable(duplicate)Converting Aspatial data to Geospatial data
Longitude and latitude values are in decimal degrees and thus the data is in wgs84 geographic coordinate system.
To convert hdb_residential2 into sf, we use st_as_sf() and set the crs argument to 4326 first. The transformation to Singapore’s coordinate reference system 3414 will be done with st_transform().
We only need the postal code, total dwelling units and geometry attributes so we will use the select() function to extract these columns.
hdb_residential_sf <- st_as_sf(hdb_residential2,
coords = c("lng", "lat"),
crs=4326) %>%
st_transform(crs = 3414) %>%
select(postal, total_dwelling_units, geometry)
str(hdb_residential_sf)sf [10,181 × 3] (S3: sf/tbl_df/tbl/data.frame)
$ postal : chr [1:10181] "190001" "460001" "461001" "500001" ...
$ total_dwelling_units: num [1:10181] 142 206 102 55 96 125 247 95 220 219 ...
$ geometry :sfc_POINT of length 10181; first list element: 'XY' num [1:2] 31468 31779
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA
..- attr(*, "names")= chr [1:2] "postal" "total_dwelling_units"
Let’s build the HDB residential population density map.

Performing in-polygon count
housing_count <- st_join(hex_grid_bounded2, hdb_residential_sf,
join = st_intersects, left = TRUE) %>%
st_drop_geometry() %>%
group_by(index) %>%
summarise(housing_count = sum(total_dwelling_units)) %>%
ungroup() %>%
mutate(housing_count = ifelse(is.na(housing_count), 0, housing_count))hex_grid_bounded3 <- left_join(hex_grid_bounded2, housing_count,
by = c("index" = "index"))
summary(hex_grid_bounded3$housing_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0 0 0 652 0 7879

- Housing density is greatest in the northeast, particularly the newer town areas like Punggol.
Employment Opportunities Density
Import Geospatial data: Business
biz <- st_read("data/geospatial/Business.shp") %>% st_transform(crs=3414)Reading layer `Business' from data source
`C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\Business.shp'
using driver `ESRI Shapefile'
Simple feature collection with 6550 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3669.148 ymin: 25408.41 xmax: 47034.83 ymax: 50148.54
Projected CRS: SVY21 / Singapore TM

Businesses are concentrated in the central and west regions.
Perform point-in-polygon count
hex_grid_bounded3$biz_count <- lengths(st_intersects(hex_grid_bounded3, biz))
summary(hex_grid_bounded3$biz_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 3.915 2.000 106.000
tm_shape(hex_grid_bounded3) +
tm_fill(col = "biz_count",
palette = "Blues",
style = "cont",
title = "Business Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
tmap_mode("plot")Schools Density
schools <- read_csv("data/aspatial/schoolsclean.csv")schools <- schools %>%
separate(latlong, into = c("latitude", "longitude"), sep = ",", convert = TRUE)
schools_sf <- st_as_sf(schools, coords = c("longitude","latitude"), crs = 4326) %>%
st_transform(crs=3414)tm_shape(hex_grid_bounded3) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
tm_shape(schools_sf) +
tm_dots() +
tm_layout(main.title = "Location of Schools",
main.title.position = "center",
main.title.size = 1,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Perform point-in-polygon count
hex_grid_bounded3$school_count <- lengths(st_intersects(hex_grid_bounded3, schools_sf))
summary(hex_grid_bounded3$school_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.1943 0.0000 4.0000
tm_shape(hex_grid_bounded3) +
tm_fill(col = "school_count",
palette = "Blues",
style = "cont",
title = "School Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Financial Services Density
Importing Geospatial data: Financial Services
FinServ <- st_read(dsn = "data/geospatial", layer = "FinServ") %>%
st_transform(crs = 3414)Reading layer `FinServ' from data source
`C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 3320 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4881.527 ymin: 25171.88 xmax: 46526.16 ymax: 49338.02
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
tm_shape(FinServ) +
tm_dots() +
tm_layout(main.title = "Location of Financial Services",
main.title.position = "center",
main.title.size = 1,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Perform point-in-polygon count
hex_grid_bounded3$fin_count <- lengths(st_intersects(hex_grid_bounded3, FinServ))
summary(hex_grid_bounded3$fin_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 1.984 1.000 117.000
tm_shape(hex_grid_bounded3) +
tm_fill(col = "fin_count",
palette = "Blues",
style = "cont",
title = "Financial Services Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Public Healthcare Density
public_hc <- read_csv("data/aspatial/HospitalsPolyclinics v_2024.csv")public_hc.sf <- st_as_sf(public_hc[1:42,], wkt = "geometry", crs = 4326) %>%
st_transform(crs=3414)
public_hc2.sf <- st_as_sf(public_hc[43:1235,], wkt = "geometry", crs = 3414) # CHAS clinics encoded in EPSG 3414
public_hc.sf <- rbind(public_hc.sf, public_hc2.sf)
#write_rds(public_hc.sf, "data/geospatial/public_hc.sf")Perform point-in-polygon count
hex_grid_bounded3$hc_count <- lengths(st_intersects(hex_grid_bounded3, public_hc.sf))
summary(hex_grid_bounded3$hc_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.7376 0.0000 17.0000
tm_shape(hex_grid_bounded3) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
tm_shape(public_hc.sf) +
tm_dots() +
tm_layout(main.title = "Location of Affordable Healthcare Services",
main.title.position = "center",
main.title.size = 1,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
tm_shape(hex_grid_bounded3) +
tm_fill(col = "hc_count",
palette = "Blues",
style = "cont",
title = "Public Healthcare Services Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Leisure & Recreation Density
leisure_recre <- st_read("data/geospatial/Liesure&Recreation.shp") %>% st_transform(crs=3414)Reading layer `Liesure&Recreation' from data source
`C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\Liesure&Recreation.shp'
using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
tm_shape(leisure_recre) +
tm_dots() +
tm_layout(main.title = "Location of Leisure & Recreation",
main.title.position = "center",
main.title.size = 1,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Perform point-in-polygon count
hex_grid_bounded3$leisure_recre_count <- lengths(st_intersects(hex_grid_bounded3, leisure_recre))
summary(hex_grid_bounded3$leisure_recre_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.7256 1.0000 27.0000
tm_shape(hex_grid_bounded3) +
tm_fill(col = "leisure_recre_count",
palette = "Blues",
style = "cont",
title = "Leisure & Recreational Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Food & Beverage (F&B)
food_bev <- st_read("data/geospatial/F&B.shp") %>% st_transform(crs=3414)Reading layer `F&B' from data source
`C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\F&B.shp'
using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
tm_shape(food_bev) +
tm_dots() +
tm_layout(main.title = "Location of Food & Beverage",
main.title.position = "center",
main.title.size = 1,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Perform point-in-polygon count
hex_grid_bounded3$food_bev_count <- lengths(st_intersects(hex_grid_bounded3, food_bev))
summary(hex_grid_bounded3$food_bev_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 1.147 0.000 113.000
tm_shape(hex_grid_bounded3) +
tm_fill(col = "food_bev_count",
palette = "Blues",
style = "cont",
title = "Food & Beverage Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Retail
retail <- st_read("data/geospatial/Retails.shp") %>% st_transform(crs=3414)Reading layer `Retails' from data source
`C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\Retails.shp'
using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
tm_shape(retail) +
tm_dots() +
tm_layout(main.title = "Location of Retail",
main.title.position = "center",
main.title.size = 1,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Perform point-in-polygon count
hex_grid_bounded3$retail_count <- lengths(st_intersects(hex_grid_bounded3, retail))
summary(hex_grid_bounded3$retail_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 0.0 0.0 22.5 10.0 1758.0
tm_shape(hex_grid_bounded3) +
tm_fill(col = "retail_count",
palette = "Blues",
style = "cont",
title = "Retail Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Entertainment
entertn <- st_read("data/geospatial/entertn.shp") %>% st_transform(crs=3414)Reading layer `entertn' from data source
`C:\guacodemoleh\IS415-GAA\Take-home_Ex\Take-home_Ex03\data\geospatial\entertn.shp'
using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM
tm_shape(hex_grid_bounded3) +
tm_fill(col = "busstop_count",
palette = "Blues",
style = "cont",
title = "Bus Stop Density") +
tm_borders(col = "grey") +
tm_shape(entertn) +
tm_dots() +
tm_layout(main.title = "Location of Entertainment",
main.title.position = "center",
main.title.size = 1,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Perform point-in-polygon count
hex_grid_bounded3$entertn_count <- lengths(st_intersects(hex_grid_bounded3, entertn))
summary(hex_grid_bounded3$entertn_count) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00000 0.00000 0.00000 0.06814 0.00000 11.00000
tm_shape(hex_grid_bounded3) +
tm_fill(col = "entertn_count",
palette = "Blues",
style = "cont",
title = "Entertainment Density") +
tm_borders(col = "grey") +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4)
Prepare Origin & Destination Variables
Origin
propulsive <- hex_grid_bounded3 %>%
st_drop_geometry() %>%
select(index, biz_count, school_count, fin_count, hc_count, busstop_count, housing_count, leisure_recre_count, retail_count, entertn_count, food_bev_count)
origin <- names(propulsive) %>%
modify_at(-1, ~ paste0("o_", .)) # Add prefix to all except index
# Assign modified names back to the data frame
names(propulsive) <- originDestination
attractiveness <- hex_grid_bounded3 %>%
st_drop_geometry() %>%
select(index, biz_count, school_count, fin_count, hc_count, busstop_count, housing_count, leisure_recre_count, retail_count, entertn_count, food_bev_count)
destin <- names(propulsive) %>%
modify_at(-1, ~ paste0("d_", .)) # Add prefix to all except index
# Assign modified names back to the data frame
names(attractiveness) <- destinCompute Distance Matrix
A distance matrix shows the distance between pairs of locations. A location’s distance from itselfis shown in the main diagonal of a distance matrix table.
In view of time, we will use sp rather than sf.
hex_grid_sp <- as(hex_grid_bounded2, "Spatial")dist <- spDists(hex_grid_sp, longlat = FALSE)
# resultant matrix
head(dist, n=c(10, 10)) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0.000 1299.038 2598.076 3897.114 5196.152 750.000 750.000 1984.313
[2,] 1299.038 0.000 1299.038 2598.076 3897.114 1984.313 750.000 750.000
[3,] 2598.076 1299.038 0.000 1299.038 2598.076 3269.174 1984.313 750.000
[4,] 3897.114 2598.076 1299.038 0.000 1299.038 4562.072 3269.174 1984.313
[5,] 5196.152 3897.114 2598.076 1299.038 0.000 5857.687 4562.072 3269.174
[6,] 750.000 1984.313 3269.174 4562.072 5857.687 0.000 1299.038 2598.076
[7,] 750.000 750.000 1984.313 3269.174 4562.072 1299.038 0.000 1299.038
[8,] 1984.313 750.000 750.000 1984.313 3269.174 2598.076 1299.038 0.000
[9,] 3269.174 1984.313 750.000 750.000 1984.313 3897.114 2598.076 1299.038
[10,] 4562.072 3269.174 1984.313 750.000 750.000 5196.152 3897.114 2598.076
[,9] [,10]
[1,] 3269.174 4562.072
[2,] 1984.313 3269.174
[3,] 750.000 1984.313
[4,] 750.000 750.000
[5,] 1984.313 750.000
[6,] 3897.114 5196.152
[7,] 2598.076 3897.114
[8,] 1299.038 2598.076
[9,] 0.000 1299.038
[10,] 1299.038 0.000
The result is a matrix object class and the column and row headers are not labeled with the hexagon grid index representing the TAZ.
Label column and row headers
Create a list sorted according to the distance matrix by Traffic Analytical Zones.
hex_names <- hex_grid_bounded2$indexAttach the hexagon grid index to the rows and columns for distance matrix matching.
colnames(dist) <- paste0(hex_names)
rownames(dist) <- paste0(hex_names)Pivot distance value by hexagon grid index
The distance matrix is pivoted into a long table by using the melt() function of reshape2.
distPair <- reshape2::melt(dist) %>%
rename(dist = value)
head(distPair, 10) Var1 Var2 dist
1 1 1 0.000
2 2 1 1299.038
3 3 1 2598.076
4 4 1 3897.114
5 5 1 5196.152
6 6 1 750.000
7 7 1 750.000
8 8 1 1984.313
9 9 1 3269.174
10 10 1 4562.072
- The intra-zonal distances are 0.
Update intra-zonal distances
Check for the lowest non-zero distance value.
distPair %>%
filter(dist > 0) %>%
summary() Var1 Var2 dist
Min. : 1 Min. : 1 Min. : 750
1st Qu.: 419 1st Qu.: 419 1st Qu.: 9808
Median : 837 Median : 837 Median :15660
Mean : 837 Mean : 837 Mean :16748
3rd Qu.:1255 3rd Qu.:1255 3rd Qu.:22362
Max. :1673 Max. :1673 Max. :54750
Since the lowest is 750m, any distance less than 750m can represent the intra-zonal distance. For consistency, 375m is appended to intra-zonal distances.
distPair$dist <- ifelse(distPair$dist == 0,
375, distPair$dist)
summary(distPair) Var1 Var2 dist
Min. : 1 Min. : 1 Min. : 375
1st Qu.: 419 1st Qu.: 419 1st Qu.: 9808
Median : 837 Median : 837 Median :15660
Mean : 837 Mean : 837 Mean :16738
3rd Qu.:1255 3rd Qu.:1255 3rd Qu.:22362
Max. :1673 Max. :1673 Max. :54750
Rename the origin and destination fields and convert into factor data type.
distPair <- distPair %>%
rename(orig = Var1,
dest = Var2) %>%
mutate(across(c(orig, dest), as.factor))Separating intra-flow from passenger volume
A new column FlowNoIntra is created to differentiate intra-zone trips from inter-zone trips based on the comparison of origin and destination zones.
od_data$FlowNoIntra <- ifelse(
od_data$ORIGIN_hex == od_data$DESTIN_hex, 0, od_data$TOTAL_TRIPS)
od_data$offset <- ifelse(
od_data$ORIGIN_hex == od_data$DESTIN_hex, 0.000001, 1)od_data <- od_data%>%
filter(FlowNoIntra > 0)Combining passenger volume data with distance value
Convert the data value type into factor for the origin and destination hex indexes.
od_data$ORIGIN_hex <- as.factor(od_data$ORIGIN_hex)
od_data$DESTIN_hex <- as.factor(od_data$DESTIN_hex)Then perform a left join to join the OD data and distPair. Store it under flow_data.
flow_data <- od_data %>%
left_join (distPair,
by = c("ORIGIN_hex" = "orig",
"DESTIN_hex" = "dest"))Preparing origin attributes
flow_data <- flow_data %>%
left_join (propulsive,
by = c("ORIGIN_hex" = "index"))Preparing destination attributes
flow_data <- flow_data %>%
left_join (attractiveness,
by = c("DESTIN_hex" = "index"))Check for variables with zero values
For log transformation, log 0 is undefined so it is critical to ensure there are no zero values in the explanatory variables.
ORIGIN_hex DESTIN_hex DAY_TYPE TIME_PER_HOUR
1126 : 9528 1126 : 9225 Length:1809270 Min. : 0.00
1349 : 9207 695 : 9052 Class :character 1st Qu.:10.00
1258 : 9110 1349 : 8999 Mode :character Median :14.00
1107 : 8728 1258 : 8927 Mean :13.88
695 : 8700 1342 : 8811 3rd Qu.:18.00
1342 : 8693 1107 : 8667 Max. :23.00
(Other):1755304 (Other):1755589
TOTAL_TRIPS FlowNoIntra offset dist
Min. : 1.0 Min. : 1.0 Min. :1 Min. : 750
1st Qu.: 3.0 1st Qu.: 3.0 1st Qu.:1 1st Qu.: 2704
Median : 13.0 Median : 13.0 Median :1 Median : 4684
Mean : 143.9 Mean : 143.9 Mean :1 Mean : 5742
3rd Qu.: 58.0 3rd Qu.: 58.0 3rd Qu.:1 3rd Qu.: 7937
Max. :134829.0 Max. :134829.0 Max. :1 Max. :24784
o_biz_count o_school_count o_fin_count o_hc_count
Min. : 0.00 Min. :0.0000 Min. : 0.000 Min. : 0.000
1st Qu.: 0.00 1st Qu.:0.0000 1st Qu.: 1.000 1st Qu.: 0.000
Median : 1.00 Median :0.0000 Median : 3.000 Median : 1.000
Mean : 5.05 Mean :0.5816 Mean : 5.663 Mean : 2.342
3rd Qu.: 5.00 3rd Qu.:1.0000 3rd Qu.: 8.000 3rd Qu.: 4.000
Max. :99.00 Max. :4.0000 Max. :44.000 Max. :17.000
o_busstop_count o_housing_count o_leisure_recre_count o_retail_count
Min. : 1.000 Min. : 0 Min. : 0.000 Min. : 0.00
1st Qu.: 6.000 1st Qu.: 0 1st Qu.: 0.000 1st Qu.: 7.00
Median : 8.000 Median :1321 Median : 1.000 Median : 26.00
Mean : 7.831 Mean :2019 Mean : 1.536 Mean : 58.96
3rd Qu.:10.000 3rd Qu.:3761 3rd Qu.: 2.000 3rd Qu.: 75.00
Max. :20.000 Max. :7879 Max. :23.000 Max. :647.00
o_entertn_count o_food_bev_count d_o_biz_count d_o_school_count
Min. :0.000 Min. : 0.00 Min. : 0.000 Min. :0.000
1st Qu.:0.000 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.:0.000
Median :0.000 Median : 0.00 Median : 1.000 Median :0.000
Mean :0.139 Mean : 1.81 Mean : 5.055 Mean :0.581
3rd Qu.:0.000 3rd Qu.: 1.00 3rd Qu.: 5.000 3rd Qu.:1.000
Max. :5.000 Max. :110.00 Max. :99.000 Max. :4.000
d_o_fin_count d_o_hc_count d_o_busstop_count d_o_housing_count
Min. : 0.000 Min. : 0.000 Min. : 1.000 Min. : 0
1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 6.000 1st Qu.: 0
Median : 3.000 Median : 1.000 Median : 8.000 Median :1321
Mean : 5.628 Mean : 2.344 Mean : 7.831 Mean :2027
3rd Qu.: 8.000 3rd Qu.: 4.000 3rd Qu.:10.000 3rd Qu.:3793
Max. :44.000 Max. :17.000 Max. :20.000 Max. :7879
d_o_leisure_recre_count d_o_retail_count d_o_entertn_count d_o_food_bev_count
Min. : 0.000 Min. : 0.00 Min. :0.0000 Min. : 0.000
1st Qu.: 0.000 1st Qu.: 7.00 1st Qu.:0.0000 1st Qu.: 0.000
Median : 1.000 Median : 26.00 Median :0.0000 Median : 0.000
Mean : 1.524 Mean : 58.78 Mean :0.1352 Mean : 1.815
3rd Qu.: 2.000 3rd Qu.: 76.00 3rd Qu.:0.0000 3rd Qu.: 1.000
Max. :23.000 Max. :647.00 Max. :5.0000 Max. :110.000
All except the bus stop count variables contain zero values and need to be replaced with a negligible value like 0.99.
flow_data <- flow_data %>%
mutate_at(vars(ends_with("_count")), ~ ifelse(. == 0, 0.99, .))
write_rds(flow_data, "data/rds/flow_data.rds")flow_data <- read_rds("data/rds/flow_data.rds")Apply log transformation
Poisson regression is based on log, so log() has to be applied to all our explanatory variables before calibrating the various spatial interaction models.
flow_data_log <- flow_data %>%
mutate_at(vars(ends_with("_count")), log) %>%
mutate(dist = log(dist))
summary(flow_data_log) ORIGIN_hex DESTIN_hex DAY_TYPE TIME_PER_HOUR
1126 : 9528 1126 : 9225 Length:1809270 Min. : 0.00
1349 : 9207 695 : 9052 Class :character 1st Qu.:10.00
1258 : 9110 1349 : 8999 Mode :character Median :14.00
1107 : 8728 1258 : 8927 Mean :13.88
695 : 8700 1342 : 8811 3rd Qu.:18.00
1342 : 8693 1107 : 8667 Max. :23.00
(Other):1755304 (Other):1755589
TOTAL_TRIPS FlowNoIntra offset dist
Min. : 1.0 Min. : 1.0 Min. :1 Min. : 6.620
1st Qu.: 3.0 1st Qu.: 3.0 1st Qu.:1 1st Qu.: 7.903
Median : 13.0 Median : 13.0 Median :1 Median : 8.452
Mean : 143.9 Mean : 143.9 Mean :1 Mean : 8.380
3rd Qu.: 58.0 3rd Qu.: 58.0 3rd Qu.:1 3rd Qu.: 8.979
Max. :134829.0 Max. :134829.0 Max. :1 Max. :10.118
o_biz_count o_school_count o_fin_count o_hc_count
Min. :-0.01005 Min. :-0.01005 Min. :-0.01005 Min. :-0.01005
1st Qu.:-0.01005 1st Qu.:-0.01005 1st Qu.: 0.00000 1st Qu.:-0.01005
Median : 0.00000 Median :-0.01005 Median : 1.09861 Median : 0.00000
Mean : 0.78921 Mean : 0.11034 Mean : 1.19420 Mean : 0.66403
3rd Qu.: 1.60944 3rd Qu.: 0.00000 3rd Qu.: 2.07944 3rd Qu.: 1.38629
Max. : 4.59512 Max. : 1.38629 Max. : 3.78419 Max. : 2.83321
o_busstop_count o_housing_count o_leisure_recre_count o_retail_count
Min. :0.000 Min. :-0.01005 Min. :-0.01005 Min. :-0.01005
1st Qu.:1.792 1st Qu.:-0.01005 1st Qu.:-0.01005 1st Qu.: 1.94591
Median :2.079 Median : 7.18614 Median : 0.00000 Median : 3.25810
Mean :1.939 Mean : 5.15444 Mean : 0.39771 Mean : 3.08542
3rd Qu.:2.303 3rd Qu.: 8.23244 3rd Qu.: 0.69315 3rd Qu.: 4.31749
Max. :2.996 Max. : 8.97196 Max. : 3.13549 Max. : 6.47235
o_entertn_count o_food_bev_count d_o_biz_count d_o_school_count
Min. :-0.01005 Min. :-0.01005 Min. :-0.01005 Min. :-0.01005
1st Qu.:-0.01005 1st Qu.:-0.01005 1st Qu.:-0.01005 1st Qu.:-0.01005
Median :-0.01005 Median :-0.01005 Median : 0.00000 Median :-0.01005
Mean : 0.01453 Mean : 0.32800 Mean : 0.78558 Mean : 0.10923
3rd Qu.:-0.01005 3rd Qu.: 0.00000 3rd Qu.: 1.60944 3rd Qu.: 0.00000
Max. : 1.60944 Max. : 4.70048 Max. : 4.59512 Max. : 1.38629
d_o_fin_count d_o_hc_count d_o_busstop_count d_o_housing_count
Min. :-0.01005 Min. :-0.01005 Min. :0.000 Min. :-0.01005
1st Qu.:-0.01005 1st Qu.:-0.01005 1st Qu.:1.792 1st Qu.:-0.01005
Median : 1.09861 Median : 0.00000 Median :2.079 Median : 7.18614
Mean : 1.18753 Mean : 0.66415 Mean :1.937 Mean : 5.17345
3rd Qu.: 2.07944 3rd Qu.: 1.38629 3rd Qu.:2.303 3rd Qu.: 8.24091
Max. : 3.78419 Max. : 2.83321 Max. :2.996 Max. : 8.97196
d_o_leisure_recre_count d_o_retail_count d_o_entertn_count
Min. :-0.01005 Min. :-0.01005 Min. :-0.01005
1st Qu.:-0.01005 1st Qu.: 1.94591 1st Qu.:-0.01005
Median : 0.00000 Median : 3.25810 Median :-0.01005
Mean : 0.39480 Mean : 3.08702 Mean : 0.01390
3rd Qu.: 0.69315 3rd Qu.: 4.33073 3rd Qu.:-0.01005
Max. : 3.13549 Max. : 6.47235 Max. : 1.60944
d_o_food_bev_count
Min. :-0.01005
1st Qu.:-0.01005
Median :-0.01005
Mean : 0.32654
3rd Qu.: 0.00000
Max. : 4.70048
Spatial Interaction Modelling
SIM is a mathematical model predicting the movement of people between origins (like homes) and destinations by examining the distance between them.
In a healthcare context, a SIM can account the likely demand for health services and the quality of service provision at health centres. Conventionally, SIM’s cousin term is gravity model.
Variable Construction
# Generate propulsive variables names
origin_var <- propulsive %>%
select(-(index)) %>%
names()
# Generate attractiveness variables names
destin_var <- attractiveness %>%
select(-(index)) %>%
names()Origin Constrained Model
# Generate the formula dynamically
formula_string <- paste("TOTAL_TRIPS ~ ORIGIN_hex +",
paste(destin_var, collapse = " + "), "+ dist - 1")
# Convert the string to a formula
formula <- as.formula(formula_string)
orcSIM <- glm(formula,
family = poisson(link = "log"),
data = flow_data_log,
na.action = na.exclude)
write_rds(orcSIM, "data/rds/decSIM.rds")orcSIM <- read_rds("data/rds/orcSIM.rds")
summary(orcSIM)Destination Constrained Model
# Generate the formula dynamically
formula_string <- paste("TOTAL_TRIPS ~ DESTIN_hex +",
paste(destin_var, collapse = " + "), "+ dist - 1")
# Convert the string to a formula
formula <- as.formula(formula_string)
decSIM <- glm(formula,
family = poisson(link = "log"),
data = flow_data_log,
na.action = na.exclude)
write_rds(decSIM, "data/rds/decSIM.rds")decSIM <- read_rds("data/rds/decSIM.rds")
summary(decSIM)Doubly Constrained Model
dbcSIM <- glm(formula = TOTAL_TRIPS ~
ORIGIN_hex +
DESTIN_hex +
dist,
family = poisson(link = "log"),
data = flow_data_log,
na.action = na.exclude)
write_rds(dbcSIM, "data/rds/dbcSIM.rds")dbcSIM <- read_rds("data/rds/decSIM.rds")
summary(dbcSIM)Prototype for Exploratory Spatial Data Analysis
In light of our class project, I aim to test the feasibility of plotting out the various commuting flow maps.
Specifically, the maps can focus on the various areas of interest and plot the flow lines originating and ending in AOIs of ranged densities.
The analytical unit could be planning area, subzone or even hexagonal.
Before diving into these analytical units, we can have a look at the general inflow and outflows from origins and destinations.
Level 1: General Perspective
The TOTAL_TRIPS is set to 3000 at minimum due to computational time taken to run the below code chunks.
General Outflows from Origins (Propulsive)
Here, I assume that the user is interested in bus traffic flows and where buses typically leave.
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)
org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
tm_shape(org_hex_3000) +
tm_fill(col = "skyblue", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_org %>%
filter(TOTAL_TRIPS >= 3000) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips"
) +
tm_layout(main.title = "Origin: Desire Lines Traffic Analysis Zones \n(Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
The potential user interface for the above scenario may look like this:
.png)
Table for Top Five Propulsive Hexagons (TBC)
Inflows to Destinations (Attractive)
This scenario assumes that the user is interested in the popular bus destinations, i.e. TOTAL_TRIPS is minmally 8000 on weekdays.
flowLine_dest <- filter(flowLine, DESTIN_hex %in% hex_grid_bounded3$index)
dest_hex <- filter(hex_grid_bounded3, index %in% flowLine_dest$DESTIN_hex)
dest_hex_8000 <- dest_hex[dest_hex$index %in% flowLine_dest$DESTIN_hex[flowLine_dest$TOTAL_TRIPS >= 10000], ]
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
tm_shape(dest_hex_8000) +
tm_fill(col = "skyblue", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_dest %>%
filter(DAY_TYPE == "WEEKDAY", TOTAL_TRIPS >= 8000) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips"
) +
tm_layout(main.title = "Destinations: Desire Lines with at least 8000 trips \nbetween Traffic Analysis Zones for Weekdays \n(Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
The potential user interface for the above scenario may look like this:
.png)
Weekday vs Weekend/Holiday
This scenario simulates the user wanting to find out how do bus destinations differ on weekends compared to weekdays.
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
tm_shape(dest_hex_8000) +
tm_fill(col = "skyblue", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_dest %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY", TOTAL_TRIPS >= 8000) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips"
) +
tm_layout(main.title = "Destinations: Desire Lines with at least 8000 trips \nbetween Traffic Analysis Zones for Weekends \n(Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
Hexagons linked to desire lines reveal that they are more popular starting or destination locations for bus rides.
There are many trips made to destinations in the north, which is around Tuas Checkpoint.
The potential user interface for the above scenario may look like this:
.png)
Specific Hour of the Day
A user might be curious about bus traffic flows and where they start on weekdays at 8am.
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)
org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
tm_shape(org_hex_3000) +
tm_fill(col = "skyblue", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_org %>%
filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 8) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips"
) +
tm_layout(main.title = "Origin: Desire Lines Traffic Analysis Zones Weekdays, 8am \n(Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
The potential user interface for the above scenario may look like this:
.png)
Since too many desire lines are plotted, the user can narrow the minimum number of total trips per origin-destination hexagon pair to 10000.
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)
org_hex_10000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 10000], ]
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
tm_shape(org_hex_10000) +
tm_fill(col = "skyblue", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_org %>%
filter(TOTAL_TRIPS >= 10000, TIME_PER_HOUR == 8) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips"
) +
tm_layout(main.title = "Origin: Desire Lines between Traffic Analysis Zones with >10000 Trips \n(Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
The potential user interface for the above scenario may look like this:
.png)
Level 2: Planning Area
The below scenario is whereby a user is interested in looking at bus trips starting from a Planning Area like Bedok at 8am on a weekday.
tmap_mode("view")
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)
org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
org_hex_3000 %>%
filter(PLN_AREA_N == "BEDOK") %>%
tm_shape() +
tm_fill(col = "skyblue", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_org %>%
filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$PLN_AREA_N == "BEDOK"]) %>%
filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 8) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips"
) +
tm_layout(main.title = "Bedok Origin: Desire Lines between Traffic Analysis Zones with >3000 Trips \n(Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) - The majority of bus trips at 8 am starting from Bedok Planning Area mostly end within the Planning Area too.
The potential user interface for the above scenario may look like this:
.png)
Multiple starting planning areas for bus trips can also be observed.
tmap_mode("view")
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)
org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
org_hex_3000 %>%
filter(PLN_AREA_N %in% c("BEDOK", "PUNGGOL", "PAYA LEBAR")) %>%
tm_shape() +
tm_fill(col = "PLN_AREA_N", palette = "Dark2", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_org %>%
filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$PLN_AREA_N %in% c("BEDOK", "PUNGGOL", "PAYA LEBAR")]) %>%
filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 8) %>%
left_join(org_hex_3000 %>% distinct(index, PLN_AREA_N), by = c("ORIGIN_hex" = "index")) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
col = "PLN_AREA_N",
palette = "Dark2",
scale = c(1, 2, 3, 4, 5, 7, 9),
alpha = 0.5
) +
tm_layout(main.title = "Bedok, Paya Lebar & Punggol Origin: Desire Lines between \nTraffic Analysis Zones with >3000 Trips (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) +
tm_view(set.zoom.limits = c(12,30))There are more origin locations at Bedok than Punggol at 8am.
There are noticeable outgoing bus trips from Punggol to Bedok Planning Area at 8am.
The potential user interface for the above scenario may look like this:
.png)
Level 3: Subzone
The user might also want to view bus travels at the subzone level, such as those starting from subzone Mathilda in Punggol planning area.
tmap_mode("plot")
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)
org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
org_hex_3000 %>%
filter(SUBZONE_N == "MATILDA") %>%
tm_shape() +
tm_fill(col = "skyblue", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_org %>%
filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$SUBZONE_N == "MATILDA"]) %>%
filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 7) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
scale = c(1,2,3,4,5,7,9),
n = 6,
alpha = 0.5,
title.lwd = "Total Trips"
) +
tm_layout(main.title = "Punggol Matilda Origin: Desire Lines between Traffic Analysis Zones with >3000 Trips \n(Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) 
- Between 7am and 8am, there are many trips from Matilda subzone to its immediate east hexagon neighbour that is also within the Punggol Planning Area.
Town Councils have recently been pushing for silver zones which promotes elderly-friendly paths along with school zones. Traffic calming in such zones may affect bus routes and it is important to check the volume of inter Planning Area bus routes. Planners can use the multiple-subzones perspective as a proxy of the potential economic costs of constructing such zones:
tmap_mode("view")
flowLine_org <- filter(flowLine, ORIGIN_hex %in% hex_grid_bounded3$index)
org_hex <- filter(hex_grid_bounded3, index %in% flowLine_org$ORIGIN_hex)
org_hex_3000 <- org_hex[org_hex$index %in% flowLine_org$ORIGIN_hex[flowLine_org$TOTAL_TRIPS >= 3000], ]
org_hex_3000 <- org_hex_3000[, c("index", setdiff(names(org_hex_3000), "index"))]
hex_grid_bounded3 <- hex_grid_bounded3[, c("index", setdiff(names(hex_grid_bounded3), "index"))]
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
tm_fill(col = "lightgrey", alpha = 0.2) +
org_hex_3000 %>%
filter(SUBZONE_N %in% c("WATERWAY EAST", "NORTHSHORE", "MATILDA", "PUNGGOL FIELD", "CONEY ISLAND")) %>%
tm_shape() +
tm_fill(col = "SUBZONE_N", palette = "Dark2", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_org %>%
filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$SUBZONE_N %in% c("WATERWAY EAST", "NORTHSHORE", "MATILDA", "PUNGGOL FIELD", "CONEY ISLAND")]) %>%
filter(DAY_TYPE == "WEEKDAY",TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 7) %>%
left_join(org_hex_3000 %>% distinct(index, SUBZONE_N), by = c("ORIGIN_hex" = "index")) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
col = "SUBZONE_N",
palette = "Dark2",
scale = c(1, 2, 3, 4, 5, 7, 9),
alpha = 0.5
) +
tm_layout(main.title = "Punggol Subzones Origin: Desire Lines between \nTraffic Analysis Zones with >3000 Trips (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) +
tm_view(set.zoom.limits = c(12,20))On weekdays, 7-8am, there has been a high total bus traffic of 29,460 trips from Matilda (hex 1368) to Punggol Field (hex 1390), and 29,178 trips from Punggol Field (hex 1390) to Waterway East (hex 1412).
There is also high bus traffic of 14,581 trips within Punggol Field subzone, from hex 1390 to hex 1400. Should there be construction plans in these hexagonal areas, the relevant government agencies or even the town council can advice residents on alternative routes to go to their respective destinations.
The potential user interface for the above scenario may look like this:
.png)
Validating Spatial Interaction Models
As a teaser to SIMs, users can have a look at whether the volume of bus trips could be influenced by areas of interests within the vicinity.
From the previous map, we can see that there is a total high bus traffic of 14,581 from Punggol Field hex 1400 to Bedok North (hex 1515) from 7-8am on weekdays.
tmap_mode("view")
tm_shape(hex_grid_bounded3) +
tm_borders(col = "grey") +
org_hex_3000 %>%
filter(SUBZONE_N %in% c("WATERWAY EAST", "NORTHSHORE", "MATILDA", "PUNGGOL FIELD", "CONEY ISLAND")) %>%
tm_shape() +
tm_fill(col = "SUBZONE_N", palette = "Dark2", alpha = 0.5) +
tm_borders(col = "grey") +
flowLine_org %>%
filter(ORIGIN_hex %in% org_hex_3000$index[org_hex_3000$SUBZONE_N %in% c("WATERWAY EAST", "NORTHSHORE", "MATILDA", "PUNGGOL FIELD", "CONEY ISLAND")]) %>%
filter(TOTAL_TRIPS >= 3000, TIME_PER_HOUR == 7) %>%
left_join(org_hex_3000 %>% distinct(index, SUBZONE_N), by = c("ORIGIN_hex" = "index")) %>%
tm_shape() +
tm_lines(lwd = "TOTAL_TRIPS",
col = "SUBZONE_N",
palette = "Dark2",
scale = c(1, 2, 3, 4, 5, 7, 9),
alpha = 0.5
) +
tm_shape(biz) +
tm_dots(col = "blue") +
tm_layout(main.title = "Punggol Subzones Origin: Desire Lines between \nTraffic Analysis Zones with >3000 Trips (Nov '23 - Jan '24)",
main.title.position = "center",
main.title.size = 0.6,
frame = TRUE) +
tm_legend(position = c("RIGHT", "BOTTOM"), legend.text.size = 0.4) +
tm_view(set.zoom.limits = c(12,20))tmap_mode("plot")- Perhaps on weekdays 8-9am, Bedok North hex 1515 is a popular destination for the population living in Punggol Field hex 1400 due to the numerous businesses in the former hex which is also an industrial area.
.png)
Follow-up Actions
The space for the commute flow maps could be further utilised with a data table showing the top statistics based on the features the user has selected.